Calculate distances using addresses and Bing Maps
by Zongyan Wang
Posted on July 26, 2018
Imagine you want to compare the following distances:
1. Times Square, New York to Disney Resorts, Orlando
2. You and the girl/boy you are chasing.
This article shows how to calculate the first distance with Python.
What you need:
Python --3.7
geocoder --1.38.1
pandas --0.23.3
How can I check my Python package versions?
For example, to find your pandas version, open a Python interpreter and run:
>>> import pandas
>>> pandas.__version__
'0.23.3'
Now here is your data.
import pandas as pd

df = pd.DataFrame({'A_address': ['Times Square'],
                   'A_city': ['Manhattan'],
                   'A_state': ['NY'],
                   'B_address': ['Walt Disney World Resort'],
                   'B_city': ['Orlando'],
                   'B_state': ['FL']},
                  index = range(1))
What we will do next:
1. Get the lat, lng information for A and B.
2. Calculate the distance with the lat, lng.
To get the lat and lng, we first need to concatenate the address parts into a full address:
df['A_full_address'] = ["%s, %s %s"%(addr, city, state) for addr, city, state in zip(df.A_address, df.A_city, df.A_state)]
df['B_full_address'] = ["%s, %s %s"%(addr, city, state) for addr, city, state in zip(df.B_address, df.B_city, df.B_state)]
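If some rows are missing a city or state, the format string above leaves a stray comma, a double space, or the text "nan" or "None" in the result. Below is a minimal sketch of a more forgiving helper. The name `build_full_address` is hypothetical (it is not part of the original code), and the sketch assumes missing values arrive as None, NaN floats, or blank strings.

```python
import math

def build_full_address(addr, city, state):
    """Join address parts as "addr, city state", skipping missing pieces.

    Hypothetical helper for illustration; not part of the original post.
    """
    def present(value):
        # Treat None, NaN floats, and blank strings as missing.
        if value is None:
            return False
        if isinstance(value, float) and math.isnan(value):
            return False
        return str(value).strip() != ""

    # City and state are separated by a space, as in the format string above.
    locality = " ".join(str(v).strip() for v in (city, state) if present(v))
    parts = [str(addr).strip()] if present(addr) else []
    if locality:
        parts.append(locality)
    return ", ".join(parts)


print(build_full_address('Times Square', 'Manhattan', 'NY'))
```

It can be dropped into the same list comprehension, for example `[build_full_address(a, c, s) for a, c, s in zip(df.A_address, df.A_city, df.A_state)]`.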
Then, calculate the latitude and longitude using geocoder.
To run the following code, you first need to get a Bing Maps key. Be careful not to waste money on duplicates: if you have multiple rows, check the number of unique addresses first.
import geocoder

# bing_key must hold your own Bing Maps key

# if you have duplicates
# df_a = df.loc[:, [x for x in df.columns if x[0] == 'A']]
## de-duplicate
# df_a = df_a.groupby('A_full_address').first().reset_index(drop=False)
for i in df.index:
    g = geocoder.bing(df.loc[i, 'A_full_address'], key = bing_key)
    df.loc[i, 'A_lng'] = g.lng
    df.loc[i, 'A_lat'] = g.lat
Or you may use the pandas apply method.
def get_loc(obj):
    g = geocoder.bing(obj, key = bing_key)
    return g.lng, g.lat

df[['B_lng', 'B_lat']] = df.apply(
    lambda obj: pd.Series(dict(zip(['B_lng', 'B_lat'], get_loc(obj['B_full_address'])))),
    axis = 1)
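The advice about duplicates can be sketched as a small cache that sends each unique address to the geocoder only once. This is a sketch under assumptions: `geocode_unique` is a hypothetical helper name, and `lookup` stands for any function that maps an address string to a `(lng, lat)` pair, such as `get_loc` above. The `fake_lookup` stand-in exists only so the example runs without a Bing Maps key or network access.

```python
def geocode_unique(addresses, lookup):
    """Return {address: (lng, lat)}, calling lookup once per unique address."""
    cache = {}
    for address in addresses:
        if address not in cache:  # only pay for addresses not seen before
            cache[address] = lookup(address)
    return cache


# Stand-in for a real geocoder call; it records how often it is invoked.
calls = []

def fake_lookup(address):
    calls.append(address)
    return (0.0, 0.0)  # placeholder coordinates, not real data


addresses = ["Times Square, Manhattan NY",
             "Walt Disney World Resort, Orlando FL",
             "Times Square, Manhattan NY"]
coords = geocode_unique(addresses, fake_lookup)
print(len(calls), "lookups for", len(addresses), "rows")
```

With the DataFrame, something like `coords = geocode_unique(df['A_full_address'], get_loc)` followed by `df['A_lng'] = df['A_full_address'].map(lambda a: coords[a][0])` would write the results back, so repeated addresses cost only one request.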
Credit for the haversine function goes to https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points. Here is the function:
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c
    mile = km * 0.621371
    return "%.2f"%mile
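Because the function above returns a formatted string, the resulting column sorts as text and cannot be averaged without converting it back to numbers. A variant that returns a float might look like the sketch below. The name `haversine_miles` and the `radius_km` parameter are assumptions added for illustration; 6367 km is the radius used above, while 6371 km is the commonly quoted mean Earth radius.

```python
from math import radians, cos, sin, asin, sqrt

KM_TO_MILES = 0.621371

def haversine_miles(lon1, lat1, lon2, lat2, radius_km=6367):
    """Great-circle distance in miles between two points in decimal degrees."""
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))  # central angle in radians
    return radius_km * c * KM_TO_MILES


# Coordinates taken from the example output row at the end of this article.
d = haversine_miles(-73.966248, 40.783436, -81.582626, 28.403811)
print(round(d, 2))
```

Rounding is then left to display time, for example with `round(d, 2)`, so the stored values stay numeric.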
Finally, calculate the distance using the haversine function defined above.
df['distance_in_miles'] = df.apply(lambda obj: haversine(*obj[['A_lng', 'A_lat', 'B_lng', 'B_lat']]), axis = 1)
Your final output will look something like this:
| | A_address | A_city | A_state | B_address | B_city | B_state | A_full_address | B_full_address | A_lng | A_lat | B_lng | B_lat | distance_in_miles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Times Square | Manhattan | NY | Walt Disney World Resort | Orlando | FL | Times Square, Manhattan NY | Walt Disney World Resort, Orlando FL | -73.966248 | 40.783436 | -81.582626 | 28.403811 | 957.21 |